1. To look for the root cause - the dataset is in factor type which is status not a number.
  2. To find out the probabilistic relationship between the symptom error code and the resolution

My caption.

1
Citation Method Output Conclusions
TroubleMiner: Mining network trouble tickets Medem, A. ; Akodjenou, M.-I ; Teixeira, R. 20091 Trouble tickets classification * Automation process on clustering the free text inside the description of the trouble tickets. * Choosing the correct keywords for the analysis Using term frequency distance between trouble tickets and similarity between clusters
Knowledge Discovery from Trouble Ticketing Reports in a Large Telecommunication Company Temprado, Y. ; Garcia, C. ; Molinero, F.J. 2009 Data Mining , Text Mining and Machine Learning , Bayes Net, Naïve Bayes Prediction on the next action of trouble tickets ,Different snapshots were added to the machine learning algorithm for training Combination of multiple method to construct the recommendation , Using Bayesian for prediction
A Bayesian Approach To Stochastic Root Finding 2011 x x x
A Fully Bayesian Approach For Unit Root Testing 2011 x x x
Online Root-Cause Analysis Of Alarms In Discrete Bayesian 2014 x x x
Documents Categorization Based On Bayesian Spanning Tree 2006 x x x
Benefits of a Bayesian Approach to Anomaly and Failure 2009 x x x

List of literature review regarding Bayesian Net :-

1.A real-life application of multi-agent systems for fault diagnosis in the provision of an Internet business service

2.A Bayesian Network approach to diagnosing the root cause of failure

3.sss

My caption.

Process on gathering the dataset

Rules Description
status = ‘Closed’ Dataset must be closed for complete information
network_tt_id is NULL Dataset must be not related to Network Trouble Ticket
trouble ticket type <> PASSIVE Trouble Ticket must related to the Active elements such as routers, switches , modem , etc
installed_date is NOT NULL This field must have value
created_date is NOT NULL This field must have value
closed_date is NOT NULL This field must have value
closed_date is NOT NULL This field must have value
product is NOT NULL This field must have value
sub_product is NOT NULL This field must have value
length description > 10 This field is useful for text analysis
rand() Record selection is in random mode
zone Should selective from different zone , sparse

For sample purpose - selecting dataset from ZONE KEPONG for the analysis due to this zone has the highest records inside the Trouble Ticket dataset.

Documentation - https://github.com/piersharding/dplyrimpaladb

install.packages(c("RJDBC", "devtools", "dplyr"))
devtools::install_github("jwills/dplyrimpaladb")
install.packages("dplyrimpaladb")
  1. Cloudera ‘Impala’, which is a massively parallel processing (MPP) SQL query engine runs natively in Apache Hadoop
  2. Impala’s Place in the Big Data Ecosystem
  3. Flexibility for Big Data Workflow
  4. High-Performance Analytics

Connection to Impala

Basic Impala drivers can be downloaded from https://github.com/Mu-Sigma/RImpala/blob/master/impala-jdbc-cdh5.zip

Below is the components required and how to set the class path for the Impala drivers , RJava , RJDBC and dplyr

suppressWarnings(suppressMessages(library("rJava")))
suppressWarnings(suppressMessages(library("RJDBC")))
suppressWarnings(suppressMessages(library("dplyr")))
suppressWarnings(suppressMessages(library("caret")))
suppressWarnings(suppressMessages(library("corrplot")))
suppressWarnings(suppressMessages(library("lazy")))
suppressWarnings(suppressMessages(library("dplyrimpaladb")))
suppressWarnings(suppressMessages(library("rpart")))
suppressWarnings(suppressMessages(library("DiagrammeR")))

.jaddClassPath(c(list.files(paste(getwd(),"/lib",sep = ''),pattern="jar$",full.names=T)))

.jinit(classpath = c(list.files(paste(getwd(),"/lib",sep = ''),pattern="jar$",full.names=T)))

dplyr.jdbc.classpath = c(list.files(paste(getwd(),"/lib",sep = ''),pattern="jar$",full.names=T))

conn <- src_impaladb(dbname='nova', host='10.54.1.151')
## Loading required package: testthat
## [1] "here:"
## [1] FALSE
result <-  tbl(conn, sql("select zone from nova.nova_trouble_ticket where zone <> 'null' group by zone order by zone limit 1000"))
as.data.frame(result)
##                          zone
## 1               ZONE AIR ITAM
## 2                  ZONE BANGI
## 3                ZONE BANGSAR
## 4                ZONE BANTING
## 5                   ZONE BATU
## 6             ZONE BATU PAHAT
## 7             ZONE BAYAN BARU
## 8                ZONE BINTULU
## 9         ZONE BUKIT ANGGERIK
## 10        ZONE BUKIT MERTAJAM
## 11            ZONE BUKIT RAJA
## 12           ZONE BUTTERWORTH
## 13             ZONE CYBERJAYA
## 14                ZONE GOMBAK
## 15                  ZONE IPOH
## 16                ZONE KAJANG
## 17                ZONE KEPONG
## 18               ZONE KERAMAT
## 19               ZONE KINRARA
## 20            ZONE KL CENTRAL
## 21                 ZONE KLANG
## 22 ZONE KOTA KINABALU SELATAN
## 23   ZONE KOTA KINABALU UTARA
## 24               ZONE KUCHING
## 25                 ZONE KULIM
## 26              ZONE LANGKAWI
## 27                ZONE MALURI
## 28          ZONE MELAKA UTARA
## 29                  ZONE MIRI
## 30     ZONE N. SEMBILAN UTARA
## 31                ZONE PANDAN
## 32               ZONE PELANGI
## 33                ZONE PERLIS
## 34         ZONE PETALING JAYA
## 35               ZONE PUCHONG
## 36         ZONE SEBERANG JAYA
## 37                 ZONE SENAI
## 38             ZONE SG PETANI
## 39             ZONE SHAH ALAM
## 40                  ZONE SIBU
## 41        ZONE SKUDAI PONTIAN
## 42               ZONE STAMPIN
## 43           ZONE SUBANG JAYA
## 44        ZONE TAMAN PETALING
## 45                ZONE TAMPOI
## 46                   ZONE TAR
## 47                 ZONE TASEK
## 48          ZONE TASIK AMPANG
## 49                   ZONE TDI
## 50           ZONE TELUK INTAN
## 51    ZONE TERENGGANU SELATAN
## 52              ZONE TERUNTUM
result <-  tbl(conn, sql("select * from nova_trouble_ticket where zone <> 'null' limit 1"))
as.data.frame(apply(as.data.frame(result),2,class))
##                        apply(as.data.frame(result), 2, class)
## tt_row_id                                           character
## tt_num                                              character
## tt_type                                             character
## tt_sub_type                                         character
## status                                              character
## severity                                            character
## important_message                                   character
## appointment_flag                                    character
## nova_account_name                                   character
## nova_subscriber_num                                 character
## nova_account_num                                    character
## package_row_id                                      character
## created_by                                          character
## category                                            character
## symptom_error_code                                  character
## priority                                            character
## product                                             character
## sub_product                                         character
## package_name                                        character
## network_tt_id                                       character
## swap_order_num                                      character
## cause_category                                      character
## cause_code                                          character
## resolution_code                                     character
## closure_category                                    character
## resolution_team                                     character
## service_affected                                    character
## service_order_num                                   character
## btu_type                                            character
## owner                                               character
## owner_name                                          character
## group_owner                                         character
## owner_position                                      character
## btu_platform                                        character
## dp_location                                         character
## created_date                                        character
## pending_verify_date                                 character
## closed_by                                           character
## closed_date                                         character
## source                                              character
## installed_date                                      character
## description                                         character
## repeat_ticket_count                                 character
## follow_up_ticket_count                              character
## fdp_device_name                                     character
## fdp_site_name                                       character
## olt_site_name                                       character
## exchange                                            character
## timestamp                                           character
## contact_id                                          character
## contact_name                                        character
## contact_office_phone                                character
## contact_mobile_phone                                character
## contact_home_phone                                  character
## contact_email_addr                                  character
## due_date                                            character
## part_num                                            character
## network_layer                                       character
## network_row_id                                      character
## asset_id                                            character
## ptt                                                 character
## zone                                                character
## service_point_id                                    character

Getting the dataset from Impala

Sample dataset - Selection trouble tickets only from Zone Kepong. The SQL is define by :-

Zone Kepong contains very rich information especially for the textual analysis and also one of the largest composition of the cause code & the resolution code which is good for the supervised learning.

Rules Description
a.status like ‘%Closed%’ Dataset must be closed for complete information
network_tt_id = ‘null’ Dataset must be not related to Network Trouble Ticket
trouble ticket type <> PASSIVE Trouble Ticket must related to the Active elements such as routers, switches , modem , etc. Excluding for now if related to the 3rd party causes , customer behavior and Passive elements
installed_date is NOT NULL This field must have value
created_date is NOT NULL This field must have value
closed_date is NOT NULL This field must have value
closed_date is NOT NULL This field must have value
product is NOT NULL This field must have value
sub_product is NOT NULL This field must have value
length description > 10 This field is useful for text analysis
rand() Record selection is in random mode
zone Should selective from different zone , sparse control

Generated SQL :-

select * from nova_trouble_ticket a join active_code b on (trim(a.cause_code) = trim(b.cause_code)) join exchange_zone c ON (trim(a.exchange)=trim(c.building_id)) and (b.code <> 'PASSIVE' ) where c.zone_name like '%ZONE KEPONG%' and a.status like '%Closed%'  and length(a.cause_category) > 1  and length(a.created_date) > 6 and length(a.closed_date) > 6 and length(a.installed_date) > 6 and a.package_name not like '%null%' and a.product not like '%null%' and a.sub_product not like '%null%'  and  length(a.description) > 10 and network_tt_id = 'null' order by rand() limit 10000 "

Dataset filtering

Removing non-related fields such as trouble ticket key , trouble ticket number , trouble ticket date etc.

conn <- src_impaladb(dbname='nova', host='10.54.1.151')
## [1] "here:"
## [1] FALSE
result <-  tbl(conn, sql("select a.tt_row_id,a.tt_num,a.tt_type,a.tt_sub_type,a.status,a.severity,a.important_message,a.appointment_flag,a.nova_account_name,a.nova_subscriber_num,a.nova_account_num,a.package_row_id,a.created_by,a.category,a.symptom_error_code,a.priority,a.product,a.sub_product,a.package_name,a.network_tt_id,a.swap_order_num,a.cause_category,a.cause_code,a.resolution_code,a.closure_category,a.resolution_team,a.service_affected,a.service_order_num,a.btu_type,a.owner,a.owner_name,a.group_owner,a.owner_position,a.btu_platform,a.dp_location,a.created_date,a.pending_verify_date,a.closed_by,a.closed_date,a.source,a.installed_date,a.description,a.repeat_ticket_count,a.follow_up_ticket_count,a.fdp_device_name,
a.fdp_site_name,a.olt_site_name,a.exchange,a.`timestamp`,a.contact_id,a.contact_name,a.contact_office_phone,a.contact_mobile_phone,a.contact_home_phone,a.contact_email_addr,a.due_date,a.part_num,a.network_layer,a.network_row_id,a.asset_id,a.ptt,a.zone,a.service_point_id , c.zone_name, c.district,c.state, c.region from nova_trouble_ticket a join active_code b on (trim(a.cause_code) = trim(b.cause_code)) join exchange_zone c ON (trim(a.exchange)=trim(c.building_id)) and (b.code <> 'PASSIVE' ) where c.zone_name like '%ZONE KEPONG%' and a.status like '%Closed%' and  length(a.cause_category) > 1 and length(a.created_date) > 6 and length(a.closed_date) > 6 and length(a.installed_date) > 6 and a.package_name not like '%null%' and a.product not like '%null%' and a.sub_product not like '%null%' and  length(a.description) > 10 and a.network_tt_id = 'null' order by rand() limit 100"))

result <- as.data.frame(result)

Close the connection from Impala

x <- conn$con
class(x) <- c('JDBCConnection')
dbDisconnect(x)
## [1] TRUE

Save the class as the data.frame

df <- as.data.frame(result)
df$contact_name <- NULL
df$contact_home_phone <- NULL
df$contact_email_addr <- NULL
df$contact_office_phone <- NULL
df$contact_mobile_phone <- NULL
df$`tt_row_id` <- NULL
df$`tt_num` <- NULL
df$tt_type <- NULL
df$`created_date` <- NULL
df$`closed_date` <- NULL
df$`installed_date` <- NULL
df$timestamp <- NULL
df$service_point_id <- NULL
df$contact_id <- NULL
df$owner_position <- NULL
df$tt_sub_type <- NULL
df$severity <- NULL
df$status <- NULL
df$important_message <- NULL
df$network_tt_id <- NULL
df$swap_order_num <- NULL
df$appointment_flag <- NULL
df$nova_account_name <- NULL
df$nova_subscriber_num <- NULL
df$nova_account_num <- NULL
df$repeat_ticket_count <- NULL
df$follow_up_ticket_count <- NULL
df$service_order_num <- NULL
df$source <- NULL
df$owner_name <- NULL
df$description <- NULL
df$due_date <- NULL
df$part_num <- NULL
df$zone <- NULL
df$ptt <- NULL
df$asset_id <- NULL
df$network_layer <- NULL
df$network_row_id <- NULL
df$pending_verify_date <- NULL
df$package_row_id <- NULL
df$priority <- NULL
summary(df)
##   created_by          category         symptom_error_code
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    product          sub_product        package_name      
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  cause_category      cause_code        resolution_code   
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  closure_category   resolution_team    service_affected  
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    btu_type            owner           group_owner       
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  btu_platform       dp_location         closed_by        
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  fdp_device_name    fdp_site_name      olt_site_name     
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##    exchange          zone_name           district        
##  Length:100         Length:100         Length:100        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     state              region         
##  Length:100         Length:100        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character

Looping the columns name and rename it to [column name]+1 as the factor name

for(i in names(df)){

  num <- as.numeric(as.factor(df[,i]))-1
  df <- cbind(df,num)
  names(df)[names(df)=="num"] <- paste(names(df[i]),"_factor",sep = "")
  print(paste(names(df[i]),"1",sep = ""))
}
## [1] "created_by1"
## [1] "category1"
## [1] "symptom_error_code1"
## [1] "product1"
## [1] "sub_product1"
## [1] "package_name1"
## [1] "cause_category1"
## [1] "cause_code1"
## [1] "resolution_code1"
## [1] "closure_category1"
## [1] "resolution_team1"
## [1] "service_affected1"
## [1] "btu_type1"
## [1] "owner1"
## [1] "group_owner1"
## [1] "btu_platform1"
## [1] "dp_location1"
## [1] "closed_by1"
## [1] "fdp_device_name1"
## [1] "fdp_site_name1"
## [1] "olt_site_name1"
## [1] "exchange1"
## [1] "zone_name1"
## [1] "district1"
## [1] "state1"
## [1] "region1"
df <- df[27:52]
names(df)
##  [1] "created_by_factor"         "category_factor"          
##  [3] "symptom_error_code_factor" "product_factor"           
##  [5] "sub_product_factor"        "package_name_factor"      
##  [7] "cause_category_factor"     "cause_code_factor"        
##  [9] "resolution_code_factor"    "closure_category_factor"  
## [11] "resolution_team_factor"    "service_affected_factor"  
## [13] "btu_type_factor"           "owner_factor"             
## [15] "group_owner_factor"        "btu_platform_factor"      
## [17] "dp_location_factor"        "closed_by_factor"         
## [19] "fdp_device_name_factor"    "fdp_site_name_factor"     
## [21] "olt_site_name_factor"      "exchange_factor"          
## [23] "zone_name_factor"          "district_factor"          
## [25] "state_factor"              "region_factor"

Remove the predictors column which might have one unique value which can leads to zero variance result

The list below is non-zero variance variables

df <- df[,-nearZeroVar(df)] 
names(df)
##  [1] "created_by_factor"         "category_factor"          
##  [3] "symptom_error_code_factor" "product_factor"           
##  [5] "sub_product_factor"        "package_name_factor"      
##  [7] "cause_category_factor"     "cause_code_factor"        
##  [9] "resolution_code_factor"    "closure_category_factor"  
## [11] "resolution_team_factor"    "service_affected_factor"  
## [13] "btu_type_factor"           "owner_factor"             
## [15] "group_owner_factor"        "btu_platform_factor"      
## [17] "dp_location_factor"        "closed_by_factor"         
## [19] "fdp_device_name_factor"    "fdp_site_name_factor"     
## [21] "olt_site_name_factor"      "exchange_factor"

Find the correlation between the variables using Pearson.

correlations <- cor(df, use="pairwise.complete.obs", method="pearson")
print(correlations)
##                           created_by_factor category_factor
## created_by_factor              1.000000e+00    -0.103518789
## category_factor               -1.035188e-01     1.000000000
## symptom_error_code_factor     -6.157588e-02     0.093625736
## product_factor                 5.092945e-02    -0.004756966
## sub_product_factor             1.647699e-01    -0.070217371
## package_name_factor            6.622391e-02    -0.033616222
## cause_category_factor          3.574845e-02     0.018413519
## cause_code_factor              1.460979e-02     0.049236852
## resolution_code_factor        -3.524580e-02    -0.056954507
## closure_category_factor        3.942088e-02     0.093577125
## resolution_team_factor         4.978484e-02     0.196889896
## service_affected_factor       -7.054111e-02    -0.332673920
## btu_type_factor               -6.849342e-02    -0.044298391
## owner_factor                   4.225674e-01    -0.001304302
## group_owner_factor             3.802286e-02     0.175218963
## btu_platform_factor           -1.985990e-02    -0.121242359
## dp_location_factor            -8.174213e-05     0.090481946
## closed_by_factor               4.225674e-01    -0.001304302
## fdp_device_name_factor        -1.955220e-02     0.030928924
## fdp_site_name_factor          -1.955220e-02     0.030928924
## olt_site_name_factor          -2.774736e-03    -0.002208588
## exchange_factor                4.386255e-02     0.050170300
##                           symptom_error_code_factor product_factor
## created_by_factor                     -6.157588e-02    0.050929453
## category_factor                        9.362574e-02   -0.004756966
## symptom_error_code_factor              1.000000e+00   -0.092192502
## product_factor                        -9.219250e-02    1.000000000
## sub_product_factor                    -4.707342e-02    0.622173565
## package_name_factor                    8.139102e-21    0.827404294
## cause_category_factor                  1.652147e-01    0.169942179
## cause_code_factor                      1.332267e-01    0.019968619
## resolution_code_factor                 8.237376e-02    0.265439233
## closure_category_factor               -1.033376e-02   -0.057229432
## resolution_team_factor                 1.051736e-01   -0.326132300
## service_affected_factor               -2.777790e-01    0.246428797
## btu_type_factor                       -4.769590e-02    0.217192576
## owner_factor                           4.493879e-02   -0.010655232
## group_owner_factor                     6.987317e-03   -0.250018447
## btu_platform_factor                    0.000000e+00    0.159181823
## dp_location_factor                     9.742155e-02   -0.036298712
## closed_by_factor                       4.493879e-02   -0.010655232
## fdp_device_name_factor                -6.792751e-02    0.033258575
## fdp_site_name_factor                  -6.792751e-02    0.033258575
## olt_site_name_factor                  -8.481453e-02    0.035622065
## exchange_factor                       -6.752270e-02    0.049916894
##                           sub_product_factor package_name_factor
## created_by_factor                 0.16476987        6.622391e-02
## category_factor                  -0.07021737       -3.361622e-02
## symptom_error_code_factor        -0.04707342        8.139102e-21
## product_factor                    0.62217357        8.274043e-01
## sub_product_factor                1.00000000        6.600184e-01
## package_name_factor               0.66001838        1.000000e+00
## cause_category_factor             0.03649430        2.689180e-02
## cause_code_factor                 0.13333300        1.594497e-03
## resolution_code_factor            0.13079713        1.480758e-01
## closure_category_factor          -0.12824227       -9.866713e-02
## resolution_team_factor           -0.20900820       -2.148615e-01
## service_affected_factor           0.14512655        1.654494e-01
## btu_type_factor                   0.13717818        8.594294e-02
## owner_factor                      0.01375153        3.725046e-02
## group_owner_factor               -0.11187639       -1.622350e-01
## btu_platform_factor               0.13750598        7.485758e-02
## dp_location_factor                0.05008800        4.885970e-02
## closed_by_factor                  0.01375153        3.725046e-02
## fdp_device_name_factor           -0.03721062       -5.055878e-02
## fdp_site_name_factor             -0.03721062       -5.055878e-02
## olt_site_name_factor             -0.03907852       -6.910593e-02
## exchange_factor                  -0.05742457        1.436679e-03
##                           cause_category_factor cause_code_factor
## created_by_factor                   0.035748449       0.014609787
## category_factor                     0.018413519       0.049236852
## symptom_error_code_factor           0.165214679       0.133226689
## product_factor                      0.169942179       0.019968619
## sub_product_factor                  0.036494296       0.133332998
## package_name_factor                 0.026891804       0.001594497
## cause_category_factor               1.000000000       0.289349037
## cause_code_factor                   0.289349037       1.000000000
## resolution_code_factor              0.424575076       0.131475713
## closure_category_factor             0.145313431      -0.104022198
## resolution_team_factor             -0.327183873       0.109750791
## service_affected_factor             0.037499062       0.035906434
## btu_type_factor                    -0.005276490      -0.119373133
## owner_factor                       -0.177254643      -0.107275894
## group_owner_factor                 -0.047123829      -0.027549588
## btu_platform_factor                -0.005635697      -0.138019471
## dp_location_factor                  0.037863902       0.194969083
## closed_by_factor                   -0.177254643      -0.107275894
## fdp_device_name_factor             -0.110953694      -0.378169436
## fdp_site_name_factor               -0.110953694      -0.378169436
## olt_site_name_factor               -0.082350746      -0.407144023
## exchange_factor                    -0.107455667      -0.411389901
##                           resolution_code_factor closure_category_factor
## created_by_factor                    -0.03524580              0.03942088
## category_factor                      -0.05695451              0.09357713
## symptom_error_code_factor             0.08237376             -0.01033376
## product_factor                        0.26543923             -0.05722943
## sub_product_factor                    0.13079713             -0.12824227
## package_name_factor                   0.14807580             -0.09866713
## cause_category_factor                 0.42457508              0.14531343
## cause_code_factor                     0.13147571             -0.10402220
## resolution_code_factor                1.00000000              0.08181496
## closure_category_factor               0.08181496              1.00000000
## resolution_team_factor               -0.46570023             -0.16239538
## service_affected_factor              -0.01380502              0.17534826
## btu_type_factor                       0.32825269              0.01711272
## owner_factor                         -0.21393749             -0.14933430
## group_owner_factor                   -0.24367602              0.17405501
## btu_platform_factor                   0.32169090              0.03459707
## dp_location_factor                   -0.24084516             -0.07656520
## closed_by_factor                     -0.21393749             -0.14933430
## fdp_device_name_factor                0.22148871              0.01076765
## fdp_site_name_factor                  0.22148871              0.01076765
## olt_site_name_factor                  0.25584205              0.02624574
## exchange_factor                       0.14321242             -0.01938107
##                           resolution_team_factor service_affected_factor
## created_by_factor                     0.04978484            -0.070541108
## category_factor                       0.19688990            -0.332673920
## symptom_error_code_factor             0.10517361            -0.277778965
## product_factor                       -0.32613230             0.246428797
## sub_product_factor                   -0.20900820             0.145126554
## package_name_factor                  -0.21486148             0.165449403
## cause_category_factor                -0.32718387             0.037499062
## cause_code_factor                     0.10975079             0.035906434
## resolution_code_factor               -0.46570023            -0.013805023
## closure_category_factor              -0.16239538             0.175348258
## resolution_team_factor                1.00000000            -0.174459776
## service_affected_factor              -0.17445978             1.000000000
## btu_type_factor                      -0.23015013             0.025075320
## owner_factor                          0.30597762            -0.191518467
## group_owner_factor                    0.08907496            -0.108683885
## btu_platform_factor                  -0.18976157             0.072958874
## dp_location_factor                    0.15548815            -0.120601390
## closed_by_factor                      0.30597762            -0.191518467
## fdp_device_name_factor               -0.13273053             0.004493253
## fdp_site_name_factor                 -0.13273053             0.004493253
## olt_site_name_factor                 -0.13371539            -0.001264552
## exchange_factor                      -0.07617183             0.045046915
##                           btu_type_factor owner_factor group_owner_factor
## created_by_factor             -0.06849342  0.422567422        0.038022862
## category_factor               -0.04429839 -0.001304302        0.175218963
## symptom_error_code_factor     -0.04769590  0.044938790        0.006987317
## product_factor                 0.21719258 -0.010655232       -0.250018447
## sub_product_factor             0.13717818  0.013751528       -0.111876388
## package_name_factor            0.08594294  0.037250458       -0.162235001
## cause_category_factor         -0.00527649 -0.177254643       -0.047123829
## cause_code_factor             -0.11937313 -0.107275894       -0.027549588
## resolution_code_factor         0.32825269 -0.213937489       -0.243676015
## closure_category_factor        0.01711272 -0.149334300        0.174055014
## resolution_team_factor        -0.23015013  0.305977624        0.089074964
## service_affected_factor        0.02507532 -0.191518467       -0.108683885
## btu_type_factor                1.00000000 -0.114728843       -0.210344389
## owner_factor                  -0.11472884  1.000000000        0.312657631
## group_owner_factor            -0.21034439  0.312657631        1.000000000
## btu_platform_factor            0.91743142 -0.140281402       -0.235698575
## dp_location_factor            -0.76330054  0.089358585        0.179428745
## closed_by_factor              -0.11472884  1.000000000        0.312657631
## fdp_device_name_factor         0.69773957 -0.102563516       -0.123760434
## fdp_site_name_factor           0.69773957 -0.102563516       -0.123760434
## olt_site_name_factor           0.69504545 -0.093581969       -0.151449059
## exchange_factor                0.40493981 -0.073246090       -0.083543009
##                           btu_platform_factor dp_location_factor
## created_by_factor                -0.019859901      -8.174213e-05
## category_factor                  -0.121242359       9.048195e-02
## symptom_error_code_factor         0.000000000       9.742155e-02
## product_factor                    0.159181823      -3.629871e-02
## sub_product_factor                0.137505980       5.008800e-02
## package_name_factor               0.074857575       4.885970e-02
## cause_category_factor            -0.005635697       3.786390e-02
## cause_code_factor                -0.138019471       1.949691e-01
## resolution_code_factor            0.321690898      -2.408452e-01
## closure_category_factor           0.034597069      -7.656520e-02
## resolution_team_factor           -0.189761572       1.554882e-01
## service_affected_factor           0.072958874      -1.206014e-01
## btu_type_factor                   0.917431418      -7.633005e-01
## owner_factor                     -0.140281402       8.935859e-02
## group_owner_factor               -0.235698575       1.794287e-01
## btu_platform_factor               1.000000000      -7.786325e-01
## dp_location_factor               -0.778632539       1.000000e+00
## closed_by_factor                 -0.140281402       8.935859e-02
## fdp_device_name_factor            0.679829117      -6.488113e-01
## fdp_site_name_factor              0.679829117      -6.488113e-01
## olt_site_name_factor              0.691038611      -6.205678e-01
## exchange_factor                   0.432506881      -3.629443e-01
##                           closed_by_factor fdp_device_name_factor
## created_by_factor              0.422567422           -0.019552198
## category_factor               -0.001304302            0.030928924
## symptom_error_code_factor      0.044938790           -0.067927511
## product_factor                -0.010655232            0.033258575
## sub_product_factor             0.013751528           -0.037210624
## package_name_factor            0.037250458           -0.050558778
## cause_category_factor         -0.177254643           -0.110953694
## cause_code_factor             -0.107275894           -0.378169436
## resolution_code_factor        -0.213937489            0.221488714
## closure_category_factor       -0.149334300            0.010767650
## resolution_team_factor         0.305977624           -0.132730527
## service_affected_factor       -0.191518467            0.004493253
## btu_type_factor               -0.114728843            0.697739570
## owner_factor                   1.000000000           -0.102563516
## group_owner_factor             0.312657631           -0.123760434
## btu_platform_factor           -0.140281402            0.679829117
## dp_location_factor             0.089358585           -0.648811265
## closed_by_factor               1.000000000           -0.102563516
## fdp_device_name_factor        -0.102563516            1.000000000
## fdp_site_name_factor          -0.102563516            1.000000000
## olt_site_name_factor          -0.093581969            0.967847038
## exchange_factor               -0.073246090            0.883823338
##                           fdp_site_name_factor olt_site_name_factor
## created_by_factor                 -0.019552198         -0.002774736
## category_factor                    0.030928924         -0.002208588
## symptom_error_code_factor         -0.067927511         -0.084814530
## product_factor                     0.033258575          0.035622065
## sub_product_factor                -0.037210624         -0.039078518
## package_name_factor               -0.050558778         -0.069105930
## cause_category_factor             -0.110953694         -0.082350746
## cause_code_factor                 -0.378169436         -0.407144023
## resolution_code_factor             0.221488714          0.255842048
## closure_category_factor            0.010767650          0.026245741
## resolution_team_factor            -0.132730527         -0.133715385
## service_affected_factor            0.004493253         -0.001264552
## btu_type_factor                    0.697739570          0.695045446
## owner_factor                      -0.102563516         -0.093581969
## group_owner_factor                -0.123760434         -0.151449059
## btu_platform_factor                0.679829117          0.691038611
## dp_location_factor                -0.648811265         -0.620567819
## closed_by_factor                  -0.102563516         -0.093581969
## fdp_device_name_factor             1.000000000          0.967847038
## fdp_site_name_factor               1.000000000          0.967847038
## olt_site_name_factor               0.967847038          1.000000000
## exchange_factor                    0.883823338          0.865384094
##                           exchange_factor
## created_by_factor             0.043862550
## category_factor               0.050170300
## symptom_error_code_factor    -0.067522699
## product_factor                0.049916894
## sub_product_factor           -0.057424565
## package_name_factor           0.001436679
## cause_category_factor        -0.107455667
## cause_code_factor            -0.411389901
## resolution_code_factor        0.143212419
## closure_category_factor      -0.019381066
## resolution_team_factor       -0.076171825
## service_affected_factor       0.045046915
## btu_type_factor               0.404939806
## owner_factor                 -0.073246090
## group_owner_factor           -0.083543009
## btu_platform_factor           0.432506881
## dp_location_factor           -0.362944340
## closed_by_factor             -0.073246090
## fdp_device_name_factor        0.883823338
## fdp_site_name_factor          0.883823338
## olt_site_name_factor          0.865384094
## exchange_factor               1.000000000

Find the highest correlated variables.

Rules Description
- +.70 or higher Very strong relationship
- +.40 to +.69 Strong positive relationship
- +.30 to +.39 Moderate relationship
- +.20 to +.29 weak relationship
- +.01 to +.19 No or negligible relationship
# Choose 0.7 Very strong relationship
highlyCorrelated <- findCorrelation(correlations, 0.7 ,verbose = FALSE,names = TRUE)
highlyCorrelated
## [1] "fdp_device_name_factor" "fdp_site_name_factor"  
## [3] "olt_site_name_factor"   "btu_platform_factor"   
## [5] "btu_type_factor"        "owner_factor"          
## [7] "product_factor"

Summary of the correlated variables.

summary(correlations)
##  created_by_factor  category_factor     symptom_error_code_factor
##  Min.   :-0.10352   Min.   :-0.332674   Min.   :-0.27778         
##  1st Qu.:-0.01978   1st Qu.:-0.041628   1st Qu.:-0.06604         
##  Median : 0.02518   Median : 0.008555   Median : 0.00000         
##  Mean   : 0.08851   Mean   : 0.048063   Mean   : 0.04314         
##  3rd Qu.: 0.05064   3rd Qu.: 0.080404   3rd Qu.: 0.09081         
##  Max.   : 1.00000   Max.   : 1.000000   Max.   : 1.00000         
##  product_factor     sub_product_factor package_name_factor
##  Min.   :-0.32613   Min.   :-0.20901   Min.   :-0.21486   
##  1st Qu.:-0.01066   1st Qu.:-0.04507   1st Qu.:-0.04632   
##  Median : 0.03444   Median : 0.02512   Median : 0.03207   
##  Mean   : 0.13376   Mean   : 0.11398   Mean   : 0.11371   
##  3rd Qu.: 0.20538   3rd Qu.: 0.13742   3rd Qu.: 0.08317   
##  Max.   : 1.00000   Max.   : 1.00000   Max.   : 1.00000   
##  cause_category_factor cause_code_factor   resolution_code_factor
##  Min.   :-0.32718      Min.   :-0.411390   Min.   :-0.46570      
##  1st Qu.:-0.10118      1st Qu.:-0.116349   1st Qu.:-0.05153      
##  Median : 0.02265      Median : 0.008102   Median : 0.13114      
##  Mean   : 0.05618      Mean   :-0.002953   Mean   : 0.10329      
##  3rd Qu.: 0.11845      3rd Qu.: 0.126045   3rd Qu.: 0.24725      
##  Max.   : 1.00000      Max.   : 1.000000   Max.   : 1.00000      
##  closure_category_factor resolution_team_factor service_affected_factor
##  Min.   :-0.16240        Min.   :-0.46570       Min.   :-0.332674      
##  1st Qu.:-0.09314        1st Qu.:-0.20420       1st Qu.:-0.117622      
##  Median : 0.01077        Median :-0.13273       Median : 0.004493      
##  Mean   : 0.03880        Mean   :-0.02077       Mean   : 0.021590      
##  3rd Qu.: 0.07122        3rd Qu.: 0.10861       3rd Qu.: 0.065981      
##  Max.   : 1.00000        Max.   : 1.00000       Max.   : 1.000000      
##  btu_type_factor     owner_factor      group_owner_factor
##  Min.   :-0.76330   Min.   :-0.21394   Min.   :-0.25002  
##  1st Qu.:-0.10317   1st Qu.:-0.11287   1st Qu.:-0.14453  
##  Median : 0.02109   Median :-0.04195   Median :-0.06533  
##  Mean   : 0.15933   Mean   : 0.07947   Mean   : 0.01856  
##  3rd Qu.: 0.38577   3rd Qu.: 0.07825   3rd Qu.: 0.15281  
##  Max.   : 1.00000   Max.   : 1.00000   Max.   : 1.00000  
##  btu_platform_factor dp_location_factor closed_by_factor  
##  Min.   :-0.77863    Min.   :-0.77863   Min.   :-0.21394  
##  1st Qu.:-0.13383    1st Qu.:-0.33242   1st Qu.:-0.11287  
##  Median : 0.05378    Median : 0.01889   Median :-0.04195  
##  Mean   : 0.15600    Mean   :-0.10292   Mean   : 0.07947  
##  3rd Qu.: 0.40480    3rd Qu.: 0.09020   3rd Qu.: 0.07825  
##  Max.   : 1.00000    Max.   : 1.00000   Max.   : 1.00000  
##  fdp_device_name_factor fdp_site_name_factor olt_site_name_factor
##  Min.   :-0.648811      Min.   :-0.648811    Min.   :-0.620568   
##  1st Qu.:-0.102564      1st Qu.:-0.102564    1st Qu.:-0.091390   
##  Median :-0.007529      Median :-0.007529    Median :-0.002492   
##  Mean   : 0.170699      Mean   : 0.170699    Mean   : 0.169238   
##  3rd Qu.: 0.565244      3rd Qu.: 0.565244    3rd Qu.: 0.582240   
##  Max.   : 1.000000      Max.   : 1.000000    Max.   : 1.000000   
##  exchange_factor   
##  Min.   :-0.41139  
##  1st Qu.:-0.07325  
##  Median : 0.02265  
##  Mean   : 0.15781  
##  3rd Qu.: 0.33951  
##  Max.   : 1.00000

Plot correlated variables.

png(height=1200, width=1200, pointsize=15, file="corrplot.png")
corrplot(correlations, method = "number",tl.cex = 0.9 ,addCoef.col="grey", order = "AOE")
dev.off()
## png 
##   2

My caption.

Feature selection process to confirm which variable does become the independent and resolution code is the dependent variable via GBM (Stochastic Gradient Boosting).

List of other available model - http://topepo.github.io/caret/modelList.html

set.seed(777)
suppressWarnings(suppressMessages(library(mlbench)))
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
model <-
train(
resolution_code_factor ~ ., data = df, method = "gbm", preProcess = "scale", trControl =
control , verbose = FALSE
)
## Loading required package: gbm
## Warning: package 'gbm' was built under R version 3.2.2
## Loading required package: survival
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:caret':
## 
##     cluster
## 
## Loading required package: splines
## Loading required package: parallel
## Loaded gbm 2.1.1
## Loading required package: plyr
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## 
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
importance <- varImp(model, scale = TRUE)
print(importance)
## gbm variable importance
## 
##   only 20 most important variables shown (out of 21)
## 
##                          Overall
## cause_code_factor       100.0000
## cause_category_factor    49.9421
## resolution_team_factor   31.3111
## created_by_factor        19.3383
## service_affected_factor  13.2905
## owner_factor             13.2559
## fdp_device_name_factor   11.1100
## btu_type_factor           6.3354
## product_factor            4.9427
## dp_location_factor        4.5966
## closure_category_factor   1.7504
## btu_platform_factor       1.5958
## package_name_factor       1.1886
## group_owner_factor        0.9891
## olt_site_name_factor      0.7943
## sub_product_factor        0.2145
## fdp_site_name_factor      0.0000
## exchange_factor           0.0000
## category_factor           0.0000
## closed_by_factor          0.0000
plot(importance)

My caption.

So far , the main variables or factors found are :-

as based on the list the importance plot after the tuning and fitting predictive model process.

My caption.

After the status = Closed

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

sss

My caption.

Data Cleansing & Transformation

workflow <- print(grViz("
digraph {
# graph attributes
graph [overlap = true]

node [shape = box,
        fontname = Calibri,
        color = blue]

# edge statements
GET_Zone_Distinct_List -> RETURN_Zone_List -> LOOP_Zone_List -> GENERATE_SQL_filter_by_zone; DESIGN_Base_SQL ->
APPLY_record_filter -> RECEIVE_Zone_Paramater -> GENERATE_SQL_filter_by_zone ->
ORDER_records_in_random_mode -> Limit_records_per_zone -> Download_dataset -> Load_in_R->COVERT_dataset_as_dataframe->REMOVE_unrelated_fields;
Download_dataset-> SAVE_Trouble_Ticket_No_As_Reference;
Download_dataset-> SAVE_Description_Text_As_Reference;
}
"))
workflow

  1. dwdw
  2. wdwwd

Previously have been mentioned here by applying SQL and condition rules here


Journal 1 - Paper

Still in progress.

My caption.

My caption.

My caption.

My caption.

My caption.


Journal 2 - Paper

Still in progress.

My caption.

My caption.

My caption.


Journal 3 - Paper

Still in progress.

My caption.

My caption.

My caption.

My caption.